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Abstract 

Consider an n x p data matrix X whose rows are independently sampled from a population with 
covariance E. When n,p are both large, the eigenvalues of the sample covariance matrix are substan¬ 
tially different from those of the true covariance. Asymptotically, as n,p —>■ oo with p/n —>■ 7, there 
is a deterministic mapping from the population spectral distribution (PSD) to the empirical spectral 
distribution (ESD) of the eigenvalues. The mapping is characterized by a fixed-point equation for the 
Stieltjes transform. 

We propose a new method to compute numerically the output ESD from an arbitrary input PSD. 
Our method, called Spectrode, finds the support and the density of the ESD to high precision; we prove 
this for finite discrete distributions. In computational experiments it outperforms existing methods by 
several orders of magnitude in speed and accuracy. We apply Spectrode to compute expectations and 
contour integrals of the ESD. These quantities are often central in applications of random matrix theory 
(RMT). 

We illustrate that Spectrode is directly useful in statistical problems, such as estimation and hy¬ 
pothesis testing for covariance matrices. Our proposal may make it more convenient to use asymptotic 
RMT in aspects of high-dimensional data analysis. 


1 Introduction 


Large data matrices are now commonly analyzed in science and engineering. Models from random matrix 
theory (RMT) are becoming increasingly used to understand the behavior of popular statistical methods on 
such matrices. RMT is particularly applicable to analyze statistical methods which depend on the sample 
covariance matrix of the data: for instance principal component analysis (PCA), classification, hypothesis 
testing of high-dimensional means, and independence tests, see e.g. the monographs [^[^[^|^. 

Concretely, consider an nxp matrix X, whose rows are independent and identically distributed random 
vectors. Suppose that are mean zero, and their covariance matrix is the p x p matrix S = Ex^x^. To 
estimate S, we form the sample covariance matrix S = 7 t,“^X^X. In the asymptotic model classically used 
in statistics, when p is fixed and n —> 00, the sample covariance matrix is a good estimator of the population 
covariance |^. 

However, if n and p are of comparable size, then S deviates substantially from the true covariance. 
The asymptotic theory of random matrices describes the behavior of the eigenvalues of S as n,p grow 
large proportionally, see . If the distribution of the eigenvalues of S tends to a limit population spectral 
distribution (PSD) as n,p —>■ 00 and the aspect ratio p/n —>■ 7, then under mild conditions the random 
eigenvalue distribution of S also tends to a deterministic limit empirical spectral distribution (ESD) [^[^. 

The “fundamental theorem of applied statistics” is the statement that often the limit of the empirical 
distribution function is the population distribution. This theorem applies in numerous settings, see e.g. [^, 
but not here. When n —>■ 00 but p/n —>■ 7 > 0 , the limit empirical spectrum differs from the true spectrum, 
because the number of samples is only a constant multiple of the dimension. This is very different from 
the case where p is fixed and n —> 00, in which case the sample spectrum converges to the true spectrum. 
The difference between the population and empirical eigenvalues has fundamental implications for high¬ 
dimensional statistical inference, see e.g. 10 , It becomes central to high-dimensional statistical analysis 
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(b) Mean, median, and mode of the limit ESD. 


Figure 1: Boxcar -|- point mass mixture example: Spectrode computes (a) the density of the limit ESD, normalized to have 
maximum equal to one for display purposes, and (b) its mean, median, and mode as a function of 7. 


to understand the relationship between the population and sample eigenvalues. This understanding should 
help adjust classical statistical methods to the high-dimensional setting. 

However, the relationship between the population and sample spectrum is complex, implicit and non¬ 
linear; it is described by a fixed-point equation - often called the Marchenko-Pastur equation or Silverstein 
equation - for the Stieltjes transform of the limit ESD. As a consequence, the ESD is not available in closed 
form, except for very special cases. The implicit description of the sample spectrum can be somewhat hard 
to understand, as well as hard to use in any practical setting, including data analysis. 

Reliable, precise, and efficient computational methods are needed to understand the relationship between 
the population and sample eigenvalues. Perhaps surprisingly, research focused on delivering robust software 
tools for numerically computing large classes of limit ESDs has received relatively little attention. While 
there are important contributions to related problems (see Section]^, none of them are fully suitable for our 
problem. 

The main method for computing the limit ESD is a fixed-point algorithm (FPA) which directly iterates 
the Silverstein equation. Since the algorithm is immediately suggested by the fixed-point characterization of 
the ESD, the history of this algorithmic approach is apparently lost in the prehistory of the subject. FPA 
has appeared recently in various forms, e.g. 11 12 14 15 . Further, FPA is recommended as the default 
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method for computing the ESD in the monograph M. FPA is a good method for computing the density of 
the ESD at a single point. However, usually the density of the ESD must be computed on a dense grid on 
the real line. When this is the case, we show that FPA is inefficient for high-precision computations. 

We propose the new method Spectrode to compute the limit empirical spectrum of covariance matrices 
from the limit population spectrum. Spectrode improves on FPA by exploiting the smoothness of the 
ESD. We show in computational experiments on dense grids that our new method is dramatically faster and 
more accurate than FPA and other methods. For instance, on natural test problems in Section [273] below, 
Spectrode is up to 1000 times faster than FPA while achieving the same accuracy! Finally, for atomic 
PSDs, i.e. weighted mixtures of point masses, we prove its convergence to the correct answer. 

Spectrode is publicly available in an open source Matlab implementation at https: //github. com/ 
dobriban/eigenedge. This software package also has the code to reproduce all computational results of our 
paper (see Section]^. 

In the remainder of the introduction, we showcase example computations with Spectrode, and highlight 
the key aspects of the method. Then we state its properties more precisely. 


1.1 Two examples 

We illustrate Spectrode by computing two spectral quantities of interest. In this example the PSD is an 
equal mixture of two components: (1) a mixture of ten point masses at 2,3,..., 11, with weights forming 
an arithmetic progression with step r = 0.005 as follows: 0.0275,0.0325,..., 0.0725; and (2) a uniform 
distribution - or a ‘boxcar’ - on [0.5,1.5], with mixture weight 1/2. The weights sum to one. We use the 



























aspect ratio 7 = 0 . 01 . 

Figure shows the example computations. In subplot (a), we show the key output of Spectrode, 
the density of the limit ESD. The computation takes 0.5 seconds on a desktop computer. A priori it is 
not obvious how many disjoint clusters there are in the ESD, or what their shape is. Several insights can 
be derived from the computation: there are 11 clusters in total, so all population clusters separate. Each 
population cluster in the PSD, in this instance, creates a distinct component of the ESD. Eurther, the two 
rightmost clusters nearly touch; and the height of the clusters decreases while the width of the point-mass 
clusters increases. We are not aware of any other method besides Spectrode from which these properties 
can be derived with comparable speed. 

As a second example, in subplot (b) we compute three functionals of the ESD as a function of 7 : the 
mean, median, and the mode. Such functionals are important in statistical applications: for instance, the 
median is used for optimal singular value shrinkage in 16 ; see also Section 4.1 As expected, the mean does 


not depend on 7 . However, the behavior of the median and the mode is not obvious. Using Spectrode, 
one can get insight into their behavior: the mode decreases as a function of 7 , and the median is greater 
than the mode. 


1.2 Highlights 

To summarize and expand on the above argument, we highlight the following aspects of our method. 

1. It provides ready access to a wide variety of new examples of limit spectra of covariance matrices. There 
has been, until now, no convenient tool for precisely calculating the ESD for such large collections of 
examples. 

2. Spectrode computes to high precision several important functionals of the limit empirical spectrum, 
namely: 


(a) 


The edges of the support: The edges are of substantial interest in the study of phase transitions 
in spiked covariance models, for instance in 17 , and in designing optimal singular value shrinkers 


for matrix denoising, for instance in 18 


(b) 


Moments of the ESD: We show that general moments Ei;’[/i(A)] can be computed conveniently 
with Spectrode. The polynomial moments Ei?[A*] can be computed alternatively via challenging 
free probability calculations, see 19 . However, this does not hold for more general moments 


Ei:’[h(A)] for arbitrary h. Therefore, our method could simplify and extend the applicability of 
existing techniques by providing a unified way to compute nearly all global spectral moments of 


(c) 


interest (Section 4.1). 

Contour integrals of the ESD’s Stieltjes transform: Contour integrals of the Stieltjes 
transform appear crucially in the central limit theorem for linear spectral statistics (LSS) of 
covariance matrices due to Bai and Silverstein [^. In applications of this powerful result to 
multivariate statistics, calculating the contour integral formulas for the mean and the variance is 
a key step, see e.g. [^. These moments are known in closed form only in a few cases. The current 
approach is to calculate them using residue theory from complex analysis. This analytic approach 
can require substantial effort, and is limited to the cases where the ESD is known in closed form. 
Spectrode enables us to compute such contour integrals numerically instead (Section |4.2[ ). High 
precision numerical results may suffice in many applications. 

3. Spectrode is directly useful in statistical applications. We give two key examples where Spectrode 
could, in our view, significantly improve on current statistical methodology: 


(a) 


Estimating the covariance matrix S: A problem of considerable interest in statistics is to 
estimate the unobserved covariance matrix S based on the observed data. When the number of 
samples n is comparable to the dimension p, covariance estimation is a challenging problem. 

A recent series of methods due to Ledoit and Wolf 21 22 assumes that one can accurately 


compute the ESD for any proposal PSD. It repeatedly invokes this ability as the ‘engine’ driving 
its core iteration. Unfortunately, the whole framework is limited by the accuracy of the ESD 











computation. Spectrode allows to immediately upgrade this procedure, replacing the existing 
low-precision ESD computations with high-precision ones. 


(b) 


Hypothesis tests on the covariance matrix: Testing statistical hypotheses on the covariance 
matrix can be approached by using the CLT for the linear spectral statistics, see e.g. 23 4j. 


As discussed above, we suggest here that the mean and variance in the CLT could be computed 
numerically (Section |4. 2 [ ). Our approach, implemented with open source software, might be signif¬ 
icantly more convenient than traditional analytic calculations, compare 23 In addition, it may 
lead to entirely new test statistics, whose analysis was not possible via pre-existing methodology. 


There may be of course many other ways that our efficient computational framework will be useful to 
the statistics and engineering communities. 


1.3 Properties of Spectrode 
1.3.1 Background 


To state precisely the properties enjoyed by Spectrode, we first set up the formal background. A more 
thorough presentation will be given in Section below. Consider a sequence of problems indexed by p, 
with deterministic p x p covariance matrices Sp. Let Hp be the distribution of eigenvalues of Sp, i.e. 
Ti,... ,Tp be the eigenvalues of Sp, and Hp the discrete distribution with cumulative distribution function 
Hp{x) = p~^ ^ ^)- l"or eachp, draw Up independent samples x^p from a distribution whose covariance 
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matrix is Sp. The samples are of the form x^p = Sp' y^p, where yip is a p-dimensional random vector with 
independent and identically distributed, mean zero, variance one entries. 

Arrange the vectors x^p into the rows of the n x p data matrix Xp. Form the sample covariance matrix 
Sp = np“^Xp Xp. Let Fp be the distribution of the p eigenvalues of Spi thus Ai,..., Ap are the eigenvalues 
of Sp, and Fp is discrete distribution with cumulative distribution function Fp{x) = p~^ ^ ^)- 

Consider the high-dimensional limit where n,p —>■ oo such that p/up —>■ 7. Suppose the eigenvalue 
distributions Hp converge to a limit population spectral distribution (PSD) H, i.e. Hp H in distribution. 
Then a cornerstone result in random matrix theory, the Marchenko-Pastur theorem, states that the empirical 
eigenvalue distributions Fp also converge, almost surely, to a limit empirical spectral distribution (ESD) F 


We consider the computation of F from H. The method we propose is general and well-defined for all 
population spectral distribut ions H. Our analysis considers atomic PSDs H, which are finite mixtures of 
point masses, but see Section 3.4 for the extension to general distributions. Thus we assume H = wAi ■ 
where 5t is the point mass at t, Wi > 0 are the component masses with Wi = 1, and ti > 0 are 
the corresponding population eigenvalues. We exclude the case 7 = 1 for technical reasons, specifically the 
potentially unbounded density of the ESD at a: = 0. 

In pioneering work, Silverstein and Choi 24 study the limit ESD corresponding to general H in depth. 
They show that the limit ESD F has a continuous density f{x) for a; ^ 0. The density f{x) exists at a: = 0 if 
7 < 1, but not if 7 > 1. Instead F has a point mass of weight I — 7 “^ at a: = 0. For atomic distributions, it 
follows from the results in 24 that the distribution is supported on a union of K disjoint compact intervals 
[IkyUk], where Ik is the lower endpoint and Uk is the upper endpoint of the fc-th interval for 1 < k < K. 
The endpoints are such that 0 < li < ui < ... < Ik < uk- The number of sample intervals K is at most 
the number of population components J. If the aspect ratio 7 = limp/n is sufficiently close to 1, then some 
population components can “merge” in the sample spectrum, and K < J will occur. Finally, it is shown in 


24 that / is analytic in the neighborhood of all points where the density is positive. 


1.3.2 Input and output of Spectrode 

Given the aspect ratio 7 , a population spectrum H (for instance an atomic distribution) and a user-specified 
precision control parameter e > 0, Spectrode produces numerical approximation of F consisting of: 

1. The number of intervals in the support of F: K = K{e). 

2. The endpoints of the support intervals [lk{s),Uk{s)], ioi k = 1,K. 










3. The density f{x,e) for all real x. 

For the reader’s convenience, the input and output of Spectrode is summarized in Table 

Table 1: Input and Output of Spectrode 


Spectrode: Input and Output 

Input: 

H -(r- population spectrum 

(e.g. atomic measure: eigenvalues ti,... ,tj and masses wi,..., wj) 
7 ^ aspect ratio 
e ■<— precision control parameter 

Output: 

K{£) number of intervals in the support 

[/^(e:), •<— endpoints of intervals in the support 

f{x,£) density of the spectrum, for any x 


1.3.3 Correctness of Spectrode 

Our main theoretical results, given in Section]^ below, demonstrate the correctness of our proposed method. 
As the user-specified precision control parameter e —>■ 0, Spectrode has the following performance charac¬ 
teristics: 

1 . Correctness of the number of disjoint intervals of the support: 

]imKie)=K. (1) 


2 . Accuracy of the endpoints of the support: 

lim Zfc(e) = Ik, and lim Mfc(e) = Uk- (2) 

£->0 £->0 


3. Accuracy of the densit-^ 

lim sup |/(a:,£) -/(a;)| = 0. 

^“*■0 a : GR \{ 0 } 


(3) 


Claims 0-© are proved in Theorem |3.3[ while claim (|3|) is proved in Theorem |3.6[ The claims are 
verified in reproducible computational experiments in the next section (see also Section [^. 

As a consequence of these results, we show in Corollary |4.1| that the moments of the limit ESD can 
be accurately computed by integrals against the approximated density. Finally, we adapt Spectrode to 


compute contour integrals involving the Stieltjes transform of the limit ESD in Section 4.2 


The computational framework used by Spectrode is applicable to general population distributions H, 
not just atomic distributions. Indeed, we already showed an example involving a uniform distribution in 
Figure [2 However, our current software implementation of Spectrode assumes that H is a finite mixture 
of uniform distributions and point masses. Moreover, the proof of convergence that we supply in this paper 
only holds for atomic distributions. Therefore, we will work with atomic distributions through most of this 


paper. This issue is further discussed in Section 3.4 


In the rest of the paper, we validate our claims with computational experiments in Sectionj^ Spectrode 
and its convergence in presented in Section]^ After giving some applications and extensions in Section]^ we 
describe related literature in Section The available software and the tools to reproduce our computational 
results are described in Section |6l 


more precise statement is: For 7 < 1, the convergence is uniform over all a; S M. For 7 > 1, the density does not exist at 
fc = 0, but is equal to f{x) = 0 on some intervals X = (—5, 0) U (0, (5), with <5 > 0. Then, the convergence is uniform over the 
closed set M \ X. 










2 Computational Results 


In addition to theoretical correctness results, we validate our performance claims from the introduction by 
computational experiments. We present supporting evidence for claims Q and® on the correctness of the 
support in Section 2.1 for claim (|^ on the correctness of the density in Section 2.2 and for the computational 
efficiency of Spectrode in Section [23| The experiments are reproducible (see Section]^. 


2.1 Correctness of the support 

2.1.1 The comb model 

To show that our algorithm identifies the support (Claims we consider the following comb model 

for eigenvalues. Here the eigenvalues and the weights are each defined in terms of arithmetic progressions 

H = (« + jb)Sc+jd- 

The eigenvalues are placed at c + jd, for some c > 0 and d S M such that c + jd > 0 for all j. They have 
weights a + jb for some a,b > 0. The constants a, b are constrained so that the sum of the weights is one, 
thus only one of them, say 5, is a free parameter. This is a flexible model governed by only three parameters. 

The comb model is useful for gaining insight into the support identification problem. Interesting behavior 
occurs as a function of the problem parameters J, b, c, d and 7 . For instance, let us change 7 while fixing 
all other variables. If 7 —>■ 0, then F~ 


H 


24 


intuitively the number of samples is much larger than the 
dimension, n ^ p, so the ESD converges to the atomic population SD. Now as 7 increases, the sharp atoms 
spread out into density bumps. 

If the original atoms are sufficiently close to each other, then at some point bumps will start merging. 
The precise moment when this happens is in general a complicated function of J, 6 , c, d and 7 , but can be 
determined precisely with Spectrode. Hence we provide a useful tool for understanding and exploring 
support identification. 


2.1.2 Testing our method 

Since in most cases there is no closed form for the density, we compare our algorithm against the Fixed- 
Point Algorithm (FPA); see its description preceding Lemma 3.8 FPA is empirically slow for dense grid 
evaluation (Section |2.3[), but converges, as shown in a more general setting in 13 


Since the convergence rate is not known, one cannot guarantee the exact accuracy of FPA. We have 
validated FPA separately on simpler test cases where the closed form expression was known (data not 
shown). 

More specifically, our numerical test has the following framework: For given problem parameters, and an 
accuracy control parameter e, we run Spectrode to produce numerical approximations K{s), his), 

The method also returns a dense grid of Xi. On this grid we compute the density approximations f{p(xi, Eq) of 
FPA, with an accuracy control parameter Eq. Here the parameter Eq is smaller than e, so that the fixed-point 
algorithm’s solution can be reliably used as a basis of comparison for e-accurate computations. 

We then define the gold standard approximation to the support as the connected components of the grid 
Xi where the density ffp{xi,Eo) > Eq- This step thresholds the density at level Eq, because FPA was tuned 
to have accuracy of the order Eq, its control parameter. This prescription produces approximations Kfp{Eo), 
kp,k{£o), Tfp,fe(eo), which we use to evaluate Spectrode. 

We evaluate Spectrode by calculating the error in the number of clusters: A^(e) = \K{e) — iFfp(eo)l) 
where £9 is suppressed for brevity. For the support endpoints we proceed similarly. If the number of intervals 
is not computed correctly, then we set this error to -l-oo. Even ii K = K, we have to take into account the 
finite precision of the grid Xi . 

Consider a lower endpoint for one of the clusters. Suppose the two methods return the grid elements Xi 
and Xj, with i < j, as numerical approximations for the lower endpoint. Due to finite grid precision, \xi —Xj\ 
can be an understimate of the actual error. For instance if Xi = Xj, it’s clear that our accuracy bound 
cannot, in general, be better than the size of grid spacings \xi — Xi-i\, \xj — Xj+i\. Generally, a conservative 
estimate of the accuracy can be obtained by adding these neighboring grid spacings to \xi — Xj\ to get (recall 
Xi-i <Xi< Xj < Xj+i)\ A(.(e) = \xi-i - Xj+i\. 









Ax(£): error in the number of clusters. 


A^(£:), Au(£): error in the cluster endpoints. 


Figure 2: Spectrode (a) correctly identifies the number of disjoint intervals in a comb model; and (b) accurately computes the 
lower and upper endpoints. 


Finally the approximation error for lower endpoints is defined as the average of all errors for lower end- 
points: Ai{e) = ^he approximation error for upper endpoints A„(e) is defined analogously. 

Note again: K ^ K, we set the error to be oo. 

The comb model in this test has J — % clusters spaced evenly between 1/2 and 10, and a gap in the 
sequence of weights b — 0.01, leading to nearly equal weights. The aspect ratio 7 takes fourf values between 
1/2® and 1/2^. We fix £0 = 10“^ and vary the accuracy e = 10“"*, m = 1,..., 6 . 

2.1.3 Results 

We show the results of the experiment on Figure]^ In panel (a), we show the error in the number of clusters 
Ak{s) for the four different aspect ratios 7, as a function of the accuracy. Spectrode makes at most one 
error in the number of clusters. For sufficiently high accuracy the number of clusters is correct. 

In panel (b), we show the approximation error for the endpoints A;(e) (left), and A„(£) (right), on a 
logarithmic scale. For the experiments where A^(e) > 0, we leave blanks. We observe that for all values of 
7 the approximation gets better with higher accuracy. This convergence appears nearly linear in £: number 
of correct digits is approximately linearly related to — logiQ(£), with a slope of approximately 1/2. These 
experiments provide evidence for our claims Q-©: Spectrode correctly identihes the support, as the 
precision parameter £ —)■ 0. 

2.2 Accurate computation of the density 

We validate our claim ([^ that Spectrode accurately computes the limit density. To test the accuracy up 
to several digits, we rely on examples where the density / can be found exactly in an alternative way. 

2.2.1 Test problems 

For our first test, called MP, the population spectrum R is a point mass at 1. The ESD has the well-known 
density: 


\ a/( 7+ -2;)(a;-7_) 

/(a;; 7 ) = - I{x e [7-,7-h]), (4) 

where 7 ± = (1 ± ^ 7 )^. The distribution of eigenvalues has a point mass at a: = 0 if 7 > 1. 

For the second test, called TwoPoint, H is a mixture of two point masses at x = 1 and t, with weights q 
and 1 — g, then the Silverstein equation ^ for v{z) becomes 

_y ^ _ f g (1 - q)t \ 

v{z) ^\l-|-'c(z) 1 + tv{z) / 
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Figure 3: Accurate computation of the density (Section |2.2[ | in two test problems. Left panel: MP. Right panel: TwoPoint. We 
display the the error in the density A(xi,£:) for three different values of €, 10“^, 10~®, 10~®. 


This is equivalent to a polynomial equation in v of degree at most three: 

ztv^ {ztz 1 — t 7 )p^ -\-[z-\-t-\-l — — q)t))]v + 1 = 0 . 


(5) 


When z,t ^ 0, as is always the case for us, this is a cubic equation in v, which can be solved exactly. The 


theory of Silverstein and Choi 24 guarantees that for real x within the spectrum support of the spectrum, 
Eq. ([^ has exactly one root with positive imaginary part. This is guaranteed to lead to the correct density. 
For real x outside the spectrum, Eq. ^ has three real roots. This distinguishes the inside from the outside. 

For each grid point Xi and accuracy £, Spectrode produces a numerical approximation f{xi,e) to the 
true density f{xi). To test Spectrode, we compute the error in the density: 


A{xi,e) = logio l/(a^od - 

We set 7 = 1/2, and vary the global accuracy parameter e in powers of ten as 10“"^, 10“®, 10” 
addition, for the two-point mixture model we set a fraction g = 1/2 of the eigenvalues to t = 8 . 


( 6 ) 

In 


2.2.2 Results 

The results are shown in Figure For both test problems, the error in the density decreases uniformly 
as the tuning parameter e —0. Furthermore, Spectrode produces approximately the required accuracy: 
for instance the average precision for e = 10“® is approximately eight digits. These experiments provide 
empirical evidence for claim ([^: Spectrode computes the density of the limit ESD with uniform accuracy 
over all x. 


2.3 Computational efficiency 

We now establish that Spectrode is computationally efficient. We compare running times with FPA and 
find that for high precision problems on dense grids, Spectrode significantly outperforms FPA. 

2.3.1 Test problems and parameters 

We use the same test problems, MP and TwoPoint, and the same parameters (J, 7 , g), as in the previous 
section. For a specified set of inputs H, 7 , and accuracy e, Spectrode produces density estimates f{xi,e) 
on a grid Xi i = 1,...,/. We record the running time of the algorithm, defined as the base ten 

logarithm of seconds to completion. Times were measured on an Intel i7 2.4 GHz PC. The relative running 
times are relevant more generally for other systems. We also record the average accuracy in the density, 
defined as: A(£) = — log^o (L^i ~ Here f{xi) is the true density which is available in 

both cases. 


























4 



-log^o(e) -log^o(e) 


4r -1 



2 4 6 2 4 6 


-log^o(e) -log^o(e) 


(a) MP 


(b) TwoPoint 


Figure 4: Running time (base ten logarithm) vs accuracy on two test problems (Section |2.3[ |. We show the log-running time 
(t(£:, 7 ), left subplot in (a) and (b) ) and average accuracy (A(£:), right subplot in (a) and (b)) of the methods as a function 

of the number of correct significant digits k in the precision parameter s = 10“^. Methods: Spectrode - dashed, fixed-point 
(FP), - dash - dotted. 


We repeat this experiment for FPA, which is described later in Algorithm from Section |3.3.3| We 
record the running time H,j) and accuracy Afp(e). To ensure comparability, we use the same grid Xi 
that was produced by Spectrode. We set the accuracy parameter rj to rj = e. We emphasize that rj limits 
the precision due to the smoothing property of Stieltjes transforms (see Lemma 3.11). Therefore, it should 
be of the same order as e to get the desired precision; this motivates our choice rj = s. Further, we apply an 
early stopping rule to the fixed-point algorithm, due to its long running time. For each grid point, we stop 
after 1/e iterations. For this reason the fixed-point algorithm does not always achieve the required accuracy. 


2.3.2 Results 

Figure 1^ shows the results, for MP in the left panel and for TwoPoint in the right panel. The running time 
t{e, H, 7 ) and the accuracy A(e) of the two methods are displayed as a function of the number of signihcant 
digits requested — logjQ(e). 

For the test problem MP in subplot (a), the number of significant digits requested varies from one to five, 
i.e. e = 10“^,..., 10“®. The running time of Spectrode is below 0.5 seconds, regardless of the accuracy 
requested, and produces the required average accuracy. The running time of FPA increases approximately 
linearly in 1 /e, reaching ^ 5000 seconds for e = 10“®. At the same time, the average accuracy is always 
about one digit. In this example Spectrode is faster and more accurate at the same time. Had we stopped 
later, FPA would have taken even longer to converge. 

This result is worth emphasizing: Spectrode is 1000 times faster and 1000 times more accurate than 
the fixed-point algorithm, at least for the highest precision e = 10 “®. 

For the test problem TwoPoint in subplot (b), the number of significant digits requested now varies from 
one to six. The running time of Spectrode is below one second, and it produces the required accuracy. 
The running time of FPA increases as e 0, reaching about ~ 2000 seconds for the largest accuracy. The 
fixed-point algorithm also gives approximately the required accuracy. In this case, Spectrode is faster than 
FPA (by three orders of magnitude for e = 10“®) while producing the same accuracy. These two examples 
show that Spectrode is fast and accurate, and compares favorably to the fixed-point algorithm. 


3 Theoretical Results 

3.1 Background 

In this section we explain our method, and prove its convergence. We start with some background about 
limiting spectral distributions of large covariance matrices. Chapter 7 of Couillet and Debbah’s monograph 
provides a good summary of the material presented here. Recall the model presented in the introduction: 
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X is n X p, of the form X = YSp' , where the entries of Y are iid with mean zero and variance one. We 
take a sequence of such problems with p,n growing to infinity such that p/n —>■ 7 > 0. The population SD 
of the deterministic Sp converges to the limit PSD H. 

The Marchenko-Pastur theorem (see 011 ) states that the empirical SD of the sample covariance matrix 
S = n“^X^X converges almost surely to a distribution F. Denote the imaginary part of 2 ; S C by Imag( 2 :) 
and the upper half of the complex plane by C+ = {z G C : Imag(z) > 0}. If m{z) denotes the Stieltjes 
transform of F, defined for z G C"*" as: 


m{z) 


dF{x) 


(7) 


and v{z) is the companion Stieltjes transform defined on C"*" by the equation 


7 (to(z) + 1 /z) = v{z) + 1 /z, 


( 8 ) 


then it is shown in 24 that v(z) is the unique solution with positive imaginary part of the Silverstein 
equation : 

1 f tdH(t) . 

—TT = / TTTTT’^ ^■ ®) 

v{z) J l+tv{z) 

This equation links the limit PSD H to the limit ESD F. The function v is analytic in the upper half 
of the complex plane. Marchenko and Pastur obtained a more general, but more complicated, form 


of this equation. The present form is due to J.W. Silverstein and appeared in 25 


Our problem is to compute F from FI. For this it is enough to find v{z), and thus m{z), for z on a grid 
close to the real axis. Then, since F has a density /, as shown in 24 , by the inversion formula for Stieltjes 
transforms, we have the limit 


f{x) = — lim lma,g{m{x + ie)}. ( 10 ) 

TT e-fO 

This limit is valid at all points x where the density f{x) exist^ For 7 < 1, the density exists for all x, 
while for 7 > 1 it exists for all x except for a: = 0. In the latter case F has an easily computable point mass 
at x = 0. Numerically it is natural to use the approximation / = Imag{m(a: + ie)}/ t: for some small £ > 0. 
There may be more efficient methods for interpolating m{x + ie), but those are beyond our scope. 

In general there are many solutions to ([^ with non-positive imaginary part. Indeed, for FI a finite 
mixture of point masses, F[ = the Silverstein equation becomes 


1 


v(z) ^ ^^l + tiV{z) 

1—1 

This is generally equivalent to a polynomial equation of degree p 


p 


WiU 


G C+. 


( 11 ) 


roots, compare 


26 


I, and hence it has p -I- 1 complex 
The desired solution will “track” one of the roots as a function of z. However, finding 


the right solution by root tracking is not feasible in general for large p. There does not appear to be a way to 
efficiently compute the coefficents of the polynomial. Indeed, those coefficients involve all symmetric sums 
of the eigenvalues U, and computing these terms seems prohibitively expensive. We will take a different 
approach. 


3.2 Our approach 

We differentiate the fixed-point equation 0 in z, and solve for v'. These steps yield the following ordinary 
differential equation for v: 

% = ■^(^) •= T-= ^0- (12) 

IG (T+i-y)2 

^The result of Silverstein and Choi [24] is stronger. It also states that the density f{x) is the imaginary part of m{x)^ defined 
as the limit of the Stieltjes transform z ^ x\ and the associated v{x) is in fact the solution of the Silverstein equation 
with z = X. While this is in fact an exact equation for the density of the ESD, we will not use it in the current paper. We will 
instead rely on the Silverstein equation on a grid close to the real axis. The reason is that we use FPA as a starting point of 
our method, and FPA is only known to converge for 2 : E C"*", in the interior of the upper halfplane. 














A high-accuracy starting point for the ODE can be found by running the fixed-point algorithm once, at 
a point zq = xq + is near the real axis. Then, the ESD can be computed at other real values x by solving 
the ODE on the line x + is, for the fixed s and varying a; G M. Solving the ODE turns out to be much 
more convenient than solving the original equation repeatedly for each new point x + is. The reason is that 
the limit spectral density is smooth, and the Stieltjes transform provides further smoothing. Our ODE uses 
this smoothness for efficient computation. This is in contrast to FPA, which re-runs the entire fixed-point 
iteration at each nearby point and does not exploit the smoothness. Using the smoothness via the ODE is 
the key inspiration behind our approacf 0 

Since the ODE was obtained by differentiating (111, it has at least one solution. We will show in our 
proof that there is only one solution in the range of interest. Then, once we obtain a numerical solution 
v{x) to the ODE, we could define / directly based on the explicit formulas (|^ and (10). This direct method 
already leads to a relatively good solution which provably converges to the right answer as e —>■ 0 . 

However, for small s this direct method has numerical problems caused by irregularities in the density 
at the edges of the support. Indeed, as shown in Figure [T] the density exhibits a square root behavior at 
the boundary of the support. If implemented naively, these square root irregularities can cause difficulties to 
the ODE solver. We will avoid these difficulties by a more sophisticated method, which hrst finds the edges 
of the spectrum by exploiting the theory of Silverstein and Choi 24 , and later solves the ODE only within 
the support of F. 

In brief, Silverstein and Choihnd the support of the spectrum in the following way. They consider the 
Silverstein equation (H), which defines the companion Stieltjes transform v implicitly as a function of z. 
They observe that the same equation defines a function z{v): 


J 


wdi 


z(v) = -f 7 > :—■ 

V ^ 1 -I- tiV 


(13) 


They prove that the support can be obtained by analyzing the monotonicity of z. Specifically, they show 
that the real intervals v G (a, b) where z{v) is increasing, i.e. z'{v) > 0 , are precisely those, whose image 
under z{v) (i.e. (z(a), z(b))) is the complement of the support of the distribution F. Here F is the limit ESD 
of n“^XX^. Since F is directly related to F, see (14), in theory this is enough to find the support. We will 
exhibit a computationally accurate method to approximate the values of the increasing intervals (a, 5), and 
prove its correctness. 

Algorithmj^below contains pseudocode for Spectrode. The full details are provided in the next sections. 


3.3 Correctness of Spectrode 

Spectrode has an user-adjustable accuracy parameter e > 0. Here we show that as e —)■ 0, the output of 
the algorithm converges to the correct limiting values. 

Spectrode has the following two main steps: 

1. Find the support of the distribution, as a union of compact intervals. 

2. Compute an approximation of the density on the intervals inside the spectrum. 

We will analyze these two parts separately, and give our main results in Theorems |3.3| and |3.6[ We will 
focus on the case 7 < 1 ; the case 7 > 1 is similar and therefore omitted. 


3.3.1 Summary of Silverstein and Choi’s results 


We rely in an essential way on the results of Silverstein and Choi in 24 . For the reader’s convenience, we sum¬ 


marize below the results we need. For any cumulative distribution function G on the real line, define the com¬ 
plement of the support of G, S'^, by 5'^ = {a; G K : there is an open neighborhood N of x, such that G(x) is 
The support of a distribution function G is defined as Sq = R\Sq. The companion distribution function 
F is the limit ESD of n“^XX^, and satisfies 


F = jF + (1 - 7)I[o,oo)- 

®This “spectral ODE” was also the source of the name Spectrode. 


( 14 ) 


constant on A^} 









Algorithm 1 Spectrode: computation of the limit ESD 
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procedure Spectrode 
input 

ti,... ,tj -It- positive eigenvalues 
wi,..., wj weights Wi > 0 , 

7 ^ aspect ratio 7^1 
e •<— precision parameter 

begin 

With accuracy £ > 0, find all intervals {ak,bk) where z{v) (13) is increasing (a^ < bk < Ufe+i) 

Define the support intervals [lk{s),Uk{s)]: 

if 7 < 1 then 

set lk{s) = z(bk) and Uk{s) = z{ak+i) for all fc < J — 1 

else 

set li{e) = z{bj), 4(e) = z{bk-i) for all 2 < fc < J — 1, and Uk{s) = z{ak) for all fc < J — 1 
Set K{e) to the number of intervals [4(e), Ufc(e)] 
for intervals [4(e),Ufc(e)] do 

Approximate by Vk the value Vk = v{tk{s) + iS); 6 = using FPA (Alg[^ with accuracy rj = e. 
Define a uniform grid Ik = Xko < ■ • ■ < XkM = Uk with elements 

Solve the ODE ( [I^ starting at Vk to find the values v{xkj + iS) 

Compute f{xkj,s) = lmag{m{xkj + i6)}/Tr, with rh from (|^ 
return K{e); support intervals [4(e),'Ufc(e)[. Within estimated support intervals, define f{x) by linear 
interpolation. Outside the estimated support define f{x) = 0. 


We recall some claims, some of them stated informally earlier in the paper. 

Lemma 3.1 (Silverstein and Choi [24| ). Let F be the limit ESD of eovariance matrices with limit PSD iJ, 
and aspect ratio 7 < 1. It holds that: 

1. F has a continuous density f{x) for all x. 

2. f is analytic in the neighborhood of any point x such that f{x) > 0. 

3. Let B = {m S M : m 4 0, —G ^h}- Then m belongs to B iff z(m) belongs to Sf and z'(m) > 0. 
This characterizes the support of F and thus also that of F. 

4- Suppose the PSD is an atomic distribution with J point masses. The number of disjoint intervals (a, b) 
in Sp such that a, 5 S Sp is at most J — 1. Therefore the support is the union of at most J disjoint 
compact intervals. 


The following is a restatement of Lemma 6.2 in [^, see also Theorem 7.5 in [^. 

Lemma 3.2 (consequences of 24 , see Lemma 6.2 in [^). Suppose the PSD H is an atomic distribution 
with ti > 0 for all i = 1,..., J, and 7 < 1. Then 


1. F is eompactly supported. 

2. The density f(x) equals zero in some right-neighborhood (0, a) of 0. 

3. Suppose (c, d) is an interval where z'{v) > 0, but z'{c) = z'{d) = 0, and z'(y ) <0 in some neighborhoods 
(c—6, c), (d, d-\-S). Then the interval {z{c), z{d)) forms one connected component of Sp, and all maximal 
compact intervals in Sp have this form. 

4 . z'(v) > 0 for all v G (0,oo). Further, z(v) < 0 for all v € (0,oo). 

5. Define D = —1/min^ tj < 0. There is a finite constant b < D, such that z'{b) = 0, while z'(y) > 0 for 
V G (— 00 , 5 ) and z'(v) < 0 for v G {b,D). Then li = z(b) is the lowest endpoint in the support of F. 








Therefore, the support of F is the union of some intervals [li,Ui], where Q < li < ui < ... < Ik < uk- 
The density /(x) = 0 on (0, Zi], [ui,li+i\ for all z, and [uk,oo). Within the intervals [li,Ui], the density /(x) 
is usually strictly positive. However, there are cases in which the density /(x) = 0 at isolated points within 
[li,Ui\. This can happen if [li,Ui\ arises from a “merge” when two intervals corresponding to neighboring 
population eigenvalues “just touch”. We emphasize that in this case we consider [U^Ui] as one component 
of the support of F. 

3.3.2 Correctness of the support 

Here we explain in detail the steps to find the support of F, and prove their correctness: 

Theorem 3.3. Correctness of K{s), lk{£)i and Uk{s): Consider the numerical approximations K{s), 
Zfc(e), Uk{s) outlined in Algorithm Q and described in detail below. Then, as s 0: 

1. The number of disjoint intervals is correctly identified: lime_,.o Kis) = K. 

2. The endpoints of the support are accurately approximated: lime_,,o4(£) = h, linie_>.o u/c(£) = Uk- 


This theorem is a consequence of Lemmas |3.4| and 3.5 below. 

Our strategy, following Lemmas |3.1| and |3.2[ is to find the intervals where z is increasing; or specifically 
the points where it switches monotonicity. Once we find such a switch point a, we will define z{a) as an 


approximate endpoint. In detail, by Lemma 3.2 claim 5, li = z( 6 ) is the lowest endpoint in the support of 
F, where b is the largest point such that z is increasing on (—oo,5). Therefore, in our algorithm 0, line 4, 
we set the leftmost interval where z is increasing as (ai, 6 i), with ai = —oo, and bi = b, where 6 (e) is the 
numerical estimate of 6 found below. 

The numerical estimate of 6 is found in the following way. Recall D = —l/miiijtj < 0 and consider a 
grid uo < ui < ... < un = B depending on e. Let h{e) be a function such that max^ \ui+i — Ui\ < h{e), and 
1/|mo| < h{e). Find the smallest index i such that z'{ui) < 0 and let 6 (e) = Ui. If there is no such index, 
then let i = N. Define li{s) = z(b(e)) as a numerical approximation of li = z( 6 ). The following lemma 
ensures the convergence of this procedure. 


Lemma 3.4. Correctness of Zi(e): 

lim^^o his) = h- 


Suppose the function hie) has limit lime_>o 6 ,(e) = 0. Then 


Proof. As /i(e) —>■ 0, for sufficiently small e we will have Ug < 6 < ujv. Therefore, for some i it will be true 


that Ui-i < b < Ui. By claim 5 of Lemma 3.2 this will be the smallest index Ui for which z'iui) < 0, and 


hence 6 (e) = Ui. Now by assumption m — Ui-i < hie), hence | 6 (e) — 6 | < hie). 

This shows that as e —>■ 0, we have 6 (e) — 6 . By inspection, z is continuous on (—oo,Z?), hence 
Zi(e) = z( 6 (e)) —>■ z( 6 ) — h, as claimed. □ 

In practice, we choose the grid ui,... ,un using an iterative doubling. We first set it to be an equi-spaced 
grid on [D — 1, D — e] with N{e) = [1/eJ elements. If this grid doesn’t contain an element with z'(ui) > 0, 
we switch to a grid on [D — 2, D — e/2] with 2A^(e) elements. We iterate this process until we find an index 
with z'iui) > 0 and proceed as above. This procedure implicitly defines /i(e) « e, but more general choices 
also work as long as lime_>o 6 ,(e) = 0 . 

We have shown that the numerical approximation to the lowest endpoint of F converges. Similarly, we 


define the remaining endpoints. By Lemma 3.2 claim 4, there is no need to look at the interval (0, oo). Let 
us define a new grid D — yo < yi < ... < yM = 0 depending on e, such that max^ \yi+i — yi\ < hie). Here 
h is again a function such that lime_,.o hie) = 0, and in our implementation we choose 6 ,(e) oc Without 
loss of generality we can assume that the yi, i G 1 ,..., M — 1, are disjoint from the finitely many values 
— ^Itj, so that z, z' are well-defined on the grid; but see the practical note at the end of this section. 

Consider the sign of the sequence zfyj). Assume without loss of generality that the smallest ti is B. 
Clearly for v near D = —1/ti, the dominating term in z' is —wi{tiv / il -\- Bu)}^, and this tends to —oo as 
V ^ D,v > D. Hence, for yi sufficiently close to D, we can assume z'iyi) < 0. The sequence z'iVj) then 
starts out negative. Denote by ii the first index where it switches sign. Define the sequence of grid indices 






*1 < *2 < • • • inductively, with ik+i the first index j > ik where the sign of z'{yj) differs from the sign of 

Then, let us define for all fc, au = Vi^k-n = yi 2 k-i- Thus o/c ,... ,bk are the ranges of indices where 
z' > 0. As described in line 5 of Algorithm Q, set lk{s) = z{bk) for 2 < /c < J — 1 and Ufc(e) = z[ak+i) for 
all fc < J — 1. Finally, set K{e) as the number of intervals {ak,bk) constructed. The lemma below ensures 
the convergence of this procedure. 

Lemma 3.5. Correctness of K(s), (k > 2), and Uk{e)'. Suppose the function h{e) has limit 

limj_>o h{e) = 0. Then the number of intervals is correctly identified in the high-precision limit: lim£_>o K{e) = 
K. Further, in addition to h, the approximations to the other endpoints of the support converge to the right 
answers: lime_>o Ikis) = h for all k > 2, and lim£_,.o Ukis) = Uk for all k. 


Proof. This is analogous to the previous proposition. Suppose (c, d) is an interval where z'(v) > 0, but 
z'{c) = z’{d) = 0, and z'{v) < 0 in some neighborhoods (c — <5, c), {d,d + 6 ). By claim 3 of Lemma 3.2 
the interval (z(c),z(d)) forms one connected component of the complement of the support of F, and all 
maximal compact intervals in Sf have this form. In addition there is an unbounded interval {uk,oo) in the 
complement of F, which is the image of the largest interval (c, 0) on which z' > 0. Therefore we must find all 
increasing intervals, with special attention to the last one. Since we already found li in the previous lemma, 
we exclude the interval {—oo,li]. 

Consider first an interval (c, d) with the properties above, where d < 0. Since the grid spacings |ui+i — 
Ui\ < h{e) —>■ 0, for small e, we will have c — S<yi<c< yi+i for some i. Therefore, z'{yi) < 0 and 
> 0. This shows that the sign of the sequence z'{yj) switches at i + 1. 

Similarly, the sign of the sequence switches at the index j + 1 for which yj < d < yj+i < d + d. For 
sufficiently small e each switch in the signs will correspond to exactly one such interval (c, d), because there 
are finitely many true switches. Hence our algorithm will choose ak = 2 /i+i, bk = yj. 

Then it’s clear that 0 < Ofc — c < yi+i — yiSi h{e). Therefore lim£_>o ak{s) = c. Now by claim 3 of Lemma 
c equals the pre-image under z of the upper endpoint ui of some support interval, i.e. z{c) = ui. Since 


3.2 


each switch in the signs of z'(j/i) corresponds to exactly one interval (c, d), this means that the sorted values 
Cl < di < C 2 < d 2 < ... correspond to the pre-image under z of ui < Z 2 < • ■ •> i-e. z{ci) = Ui, zfdi) = li+i 
for all i. 

This shows that lime_>o Ok+ii^) = z~^{uk) for all A: < AT— 1. It’s easy to see that z is continuous at a^+i, 
because the only points of discontinuity are at the values —l/t^. By continuity limg_,.o 2 :(a/c+i(£:)) = uu, i.e. 
limj_>o Uk{£) = Mfc, for A: < AT — 1, as required. Similarly lim£_,.o lk{s) = h for all k > 2. 

To show that the highest support endpoint converges, i.e., lime_>o uk{£) = uk, we must study the largest 
interval (c, 0) where z' > 0. The reason is that, as it’s easy to see, z' > 0 in a small neighborhood (— 77 , 0 ) 
of zero. Therefore, by Lemma |3.2[ the upper endpoint of the support of F will be the smallest c such that 
z' > 0 on (c, 0). The proof of the convergence in this case is very similar to the analysis presented above, 
and therefore omitted. □ 


Lemmas |3.4| and |3.5| together imply Theorem |3.3[ Our practical implementation of the algorithm is a bit 
more involved. We consider all intervals Ji = (—l/i^, — 1/Ai+i) one-by-one. It is beneficial to break down 
the search for increasing intervals to separate intervals Ji because - as it is easy to see - z{v) has at most one 
increasing interval within each Ji. Further, z{v) has no singularities within any Ji. These properties ensure 
added stability for finding the support. 


3.3.3 Correctness of the density 

In this section we prove the accurate computation of the density: 

Theorem 3.6. Correctness of f(x,e): Consider the numerical approximation to the density f{x,e), 
outlined in Algorithm Q and described in detail below. Then, as £ —)■ 0, the approximation converges to the 
true density uniformly over all x: sup^.^* |/(a:,e) — f{x)\ —>■ 0. 

This theorem is proved in at the end of this section, using the tools developed in in the following lemmas. 
The approximation to the density is defined in several stages, which are outlined below. For the reader’s 
convenience, we provide Table summarizing the notation and definitions for each stage. 
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Table 2: Definitions used in the proof of Theorem 


Name 

Definition 

Defined in 

Analyzed in Lemmas 

Xj{e) 

grid 


16 




f(:r) 

true density 


10 

1 


3.7 


f{xj{e),e) 

solution to exact ODE 


17 



3.8 


f{xj{e),e) 

solution to inexact start ODE 


18 



3.9 

3.10 


f(xj(e),e) 

Euler’s method approximation 

1 

20 

1 


3.7 

3.10 



In the first stage, outside the support intervals [4(e), Mfe(e)] we define f{x, e) = 0. We also set f{ik{e), e) = 
/(ufc(e),e) = 0. Since the approximated support converges, we see that the estimated density converges to 
zero uniformly outside of the support. All that is left is to handle the support intervals. 

Consider a support interval [l,u\, where I < u are the true endpoints of a connected component of the 
support of F. Denote the estimated support by [/, u], and let / = xq < xi < ... < xm = u be a uniformly 
spaced grid Xi = Xi{£) of length M = M(e) depending on e, on which we will approximate the density /(xj). 
In Algorithm Q, we specified M = for concreteness. We will see in the proof that more general 

choices of grids work. For this reason, we will not specify the choice of the grid at the moment, and instead 
only require that the spacings tend to zero: \xi+i — Xi\ < h^e), and lime_>o h{£) = 0. 

Next, we reduce the approximation problem to the grid Xi. As explained in Algorithm Q, for x within 
the estimated support and not necessarily on the grid, define the linear interpolation f[x,£) = af{xi,£) + 
(1 — a)f(xi+i,£), where Xi < x < Xi+i, and x = axi + (1 — a)xi+i. This ensures that the estimated density 
/(•, e) is continuous. With these definitions, we reduce uniform convergence over all x to uniform convergence 
only on the grid. 

Lemma 3.7. To show the convergence in Theorem \3.6[ it is enough to show that the density approximations 
converge uniformly on the grid Xi = Xi{e), that is: 


lim max \f{x^i£),£) - f{x^{e))\ = 0 . 

£->■0 0 < i < M ( e ) 


(15) 


Proof. It is easy to check that by construction, h < h < Ui < Ui for all support intervals. Therefore, we have 
for any x 


f{x,£) - f{x) = 


0 

-J{x) 

f{x,e) - f{x) 


if x ^ [li,Ui], for any i 

if h < X < li, OT Ui < x < Ui for some i 

if li < X < Ui, for some i 


The convergence claim made by Lemma [3.7| is clear in the first case. In the second case, note that there 
are only finitely many support intervals. Therefore it is enough to show lim£_>o |/(a:)| = 0 for all 

i, and the analogous statement for upper endpoints. We showed earlier in Proposition 3.3 that li —>■ k. By 
continuity of /, this shows the desired claim sup^.j.<^<j. |/(a:)| —)■ 0 for the second case. 

The third case is the most interesting one. Consider any x such that U < x < Ui. There are two neighbors 
in the grid such that Xile) < x < Xi^i(£). By the triangle inequality, we can bound 


lf(x,£) - f(x)l < lf(x,£) - f(Xi,£)l + lf(x^,£) - f(Xi)l + If(xi) - /(x)|. 

Recall that the maximum spacing was bounded: |xi+i — Xi| < /i(e). Let us denote a modulus of continuity 
for a function g by oj. This function oj enjoys |g(x) — g{y)\ < a;(|x — y\, g) for any x, y. Taking the maximum 
over all x G Si (where Si = [ii,Ui]) in the previous display, we obtain: 


sup |/(x,e) - /(x)| < u}{h{e),f) + max |/(xi,e) - /(x*)! +a;(h(e),/). 

xeS, 0<i<M(e) 

Since /, / are continuous and compactly suppported, they are uniformly continuous. Therefore, as 
h{e) -G 0, we get uj{h{e),f) -G 0, and similarly for /. Assuming (15), this yields the desired claim: 


lim sup |/(x,e) - /(x)| = 0 . 


□ 























We will now focus on showing the convergence on the grid. The numerical approximation is found using 
an ordinary differential equation, whose starting point is obtained via the hxed-point algorithm (FPA). In 
FPA is presented for a more general class of problems; for the reader’s convenience, we describe the 


special case needed in Algorithm 


Algorithm 2 FPA: Fixed-Point Algorithm 

1: procedure FPA 
2: input 

3: H ^ population eigenvalue distribution 

4: 7 ^ aspect ratio 

5: r] -It- accuracy parameter (> 0) 

6 : z •<— complex argument € C"*". If the argument x is real, then z x + irf 

7; initialize: 

8: Vo ^ -l/z 

9: n <— 0 

10: h(v) := z-jf J^dff(t) 

11: while \1/Vn + h{Vn)\ > v 

12: Vn+1 < - l/HVn) 

13: n ^ n + 1. 

14: end; 

15: nin ^ - l)/z 

16: return v{z,ri) = u„; f{x,r]) = Imag(m„)/ 7 r, where x = Re( 2 ;) 


The fixed-point algorithm is a method for solving the Silverstein equation. It is based on the observation 
that 0 is a fixed-point equation v = —l/h{v,z), for any given z. Then one defines a starting point 
Vo = —^/z, and iterates Vn+i = —l/h(u„;z) until the convergence criterion |l/i;„ -I- h(u„)| < rj is met. 
Let v{z,r]) be the solution produced by the fixed-point algorithm. It was shown in [13| that, for any fixed 
z € C"*", v(z,rj) converges to the unique solution of the Silverstein equation (11) with positive imaginary 
part, as ?7 —>■ 0 : v{z,r]) —>■ v{z). 

The accuracy parameter r] in the fixed-point algorithm is important. If the algorithm is used to approx¬ 
imate the density of the ESD at x, with accuracy 77 , as in the section comparing the different methods, we 
run FPA at z = x -I- The scaling 77 ^ is motivated by Lemma 3.11 The proof of this lemma shows that 
the true solution at imaginary part e guarantees an accuracy Therefore, to get accuracy 77 , we go down 
to imaginary part 77 ^. Further, we use the same threshold 77 in the stopping criterion. In principle, these two 
parameters could be decoupled, but this simple choice suffices for our purposes. 

Therefore, to find the starting point, we use FPA for z = I iS{s), where I is the approximation to the 
lower grid endpoint, and i5(e) = The accuracy parameter 77 is set t o 77 = e. This gives a starting point 
Vo = v{l + i6{s)). For simplicity, we will first analyze, in Lemma 
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the case of an exact starting point 
Vo = v(l + i6{e)). That is, we argue about the case when the solution of the Silverstein equation has been 
found exactly by FPA. In Lemma we extend the argument to the inexact starting point vo- 

We call the exact ODE Eq. (12) on the interval [^u], started at the true solution vq = v{l -|-Z(5(e)). The 
exact ODE has the right solution: 


Lemma 3.8. Correctness of the exact ODE: 

r of a real variable x 


Consider the ODE (12) for the complex-valued function 


dr 

dx 


1 


1 

'y 2^1=1 (l+t;r )2 


r(xo) = Vo- 


Let the starting point be the exact solution vq = v(xo -j- i6(£)). Then this equation has a unique solution on 
[xo, It], and this solution is r(x) = v{x -I- 7 ( 5 (e)). 


Proof The ODE was obtained by differentiating the Silverstein equation 0 for V. Since r(x) = 7 ;(x-|-z< 5 (e)) 
obeys the Silverstein equation and also satishes the starting condition, it is clearly a solution. 

To show that the solution is unique, we appeal to basic results in ordinary differential equations. Specif¬ 
ically, consider the general ODE y' = g{y), y{xo) = yo- It is well known (e.g. Theorem 7.4 on pp. 39 of the 

















monograph by Hairer and Norsett 27 ) that the solution is unique on the open set {x,y) G U cM.xC where 
g,g' are continuous. If started at any point (xo,yo) G U, the solution can be continued to the boundary of 

U. 

In our case g{r) is continuous on the entire image v{C~^) = {y : y = v{z), for some z G C+l. Indeed, let 
2/0 be an arbitrary element of r)(C^), so that 2/0 = v(zo) for some zg G C"*". Then by the definition of the 
ODE, v'(zo) = g{yo)- Now v is analytic on C+, so clearly v'{z) is well-defined and continuous at zg- By the 
expression for v'{z), v'{z) = l/k{v) for some complex function fc, we see that v'{z) 7 ^ 0. Hence, by the inverse 
function theorem, v is invertible near yg, so that locally z = v~^{y) in a neighborhood of yg. Therefore, 
locally near yg, g{y) = v'{v~^{y)). This shows that g is continuous near yg. Hence, g{y) is continuous on 
the entire image z;(C"'"). By a similar argument g'(i/) is continuous on the entire image v (C+). 

This shows that for our problem U contains K x u(C“''). Clearly we start at a point {xg, vg) in U. By the 
result cited above, the solution to the ODE is unique on the entire set x S K, and in particular on [xo,m], 
finishing the proof. □ 

Next, we will argue that even with an inexact starting point for the ODE, the solutions are still nearly 
exact. Suppose that FPA produces an estimate vg = v{z,rf) of vg. We call the ODE (12) started at vg the 


inexact start ODE. The difference c(e) = Vg — Vg can be made arbitrarily small by taking 77 sufficiently close 
to zero in FPA. The following lemma ensures that the solution to the inexact start ODE stays close to the 
true solution. 


Lemma 3.9. Correctness of the inexact start ODE: Consider the ODE (12) as given in Lemma 3.8 
but started at iio = I'd + c(e), where Vg is the solution produced by EPA for starting point z = I + i6{e) and 
sufficiently small rj. Then, for sufficiently small e, this inexact start ODE has a unique solution r on [l,u\, 
which obeys |r(x) — r{x)\ = 0{c{e)) uniformly over all x £ [^u]. 

Proof. First we will show the uniqueness of the solution. By Proposition 1 of [13| , the fixed-point Algorithm 
[^started at z = f -I- iS{e), and with accuracy 77 , produces a solution fio = ^’(•2, g) such that v{z, 77 ) —)■ v{z) as 
77 —> 0. Therefore, for sufficiently small 77 , we can ensure that vg — vg = 0(c(e)) for an arbitary small c(£). 

By the open mapping theorem, is an open set containing the true solution curve r(x) = v(x+iS(e)). 

Hence, for sufficiently small c, vg G v(C~^). Hence, vg = v(a + ib) for some a and b > 0. Then, by the same 


argument as in Lemma 3.8 the solution f exists and is unique, and is given by r(x) = v((x — xg) + a + ib). 


This solution exists for all x and belongs to 7;(C“''). 

Next, we will use a general inequality for inexact start ODEs for the quantitative bound. The ‘fundamen¬ 
tal lemma’ (Theorem 10.2 of [^, pp 58) states: Consider the ODE y' — g(y), and let y,y be two solutions 
with starting points y(xg), y(xg). Suppose \g'{y)\ < L on a connected set Kg containing the two solution 
curves y,y for x G [xo,xm]- Then the solutions y,y are close to each other for all x G [xg,XM\, specifically: 
\y{x) - y{x)\ < \y{xg) - y{xg)\exp{{x - Xg)L). 

We have shown in the proof of Lemma 3.8 that f'{y) is continuous in the image v{C^). Let Kg be a 
compact connected subset of 7;{C''') containing the solution curves r{x),f{x) for x G [Z,u] (which exist by 
the above argument). Then f' is bounded by some constant L on Kg. By the fundamental lemma, we have 
|f(x) — r{x)\ < c(e) exp((M — l)L) = 0{c{e)), as required. This finishes the lemma. □ 

Next we introduce notations for the solutions of the two ODEs. As we discussed earlier, the grid Xi = Xi(e) 
is uniformly spaced on [l,u\: 

I = Xg < ... < xm = u, (16) 


The solutions to the exact ODE (12) in Lemma 3.8 on the grid are v{xj + iS(e)). We then define 
m(xj + iS{e)) according to 


and 


f{xj{e),e) = Imag( 777 (xj -|- 7 i 5 (e)))/ 7 r. 

Similarly, with the solutions v{xj + iS{e)) to the inexact start ODE analyzed in Lemma 
grid define rh{xj + iS{e)) accordingly, using (|^, and 


3.9 


(17) 

on the same 


f{xj{s),s) = Imag(7Ti(xj -|-7i5(e)))/7r. 


(18) 











Lemma [3.91 shows that 


\f{xj{e),e) - f{xj{e),e)\ = 0(c(e)). 


(19) 


In particular, this bound highlights that the approximation / is uniformly accurate over the grid Xj, as 
required by Lemma |3.7| 

We show next that Euler’s method for discretizing the inexact start ODE on the grid Xi produces 
numerical approximations that converge as e —> 0. In practice we use a higher order ODE solver, but for 
simplicity here we consider Euler’s method. 


Let Vj be the sequence produced by Euler’s method for the inexact start ODE (121 on the discretization 
I = xq < ... < xm = u. Define rhj = + ( 7 “^ — where Zj = Xj + iS{e). Also define the density 

approximations 

f{xj,e) = Imag(mj)/7r. (20) 

Then we have the following: 

Lemma 3.10. Euler’s method approximates the true solution to the inexact start ODE: Consider 
a fixed £ > 0, and suppose that max^ \xi+i — Xi\ < h. Then \ f{xj{e),e) — f{xj{£),e)\ = 0{h) for sufficiently 
small h. 

Proof. This lemma is a direct consequence of the well-known error estimate for Euler’s method. Consider 
the general ODE y' = g{y), y{xo) = yo- Theorem 7.5 on pp. 40 of states that the bounds l^l < A, 
l^'l < L in a neighborhood of the solution imply that the Euler polygons yh{x), on a grid Xi with maximum 
spacing at most h, obey \yh{x) — y{x)\ < A(exp(L(a: — xq)) — l)h. In lemmas 3.8 3.9 we have shown for 


the inexact start ODE (12) started at vq that g,g' are continuous in a neighborhood of the solution, so the 

therefore the exponential term is bounded. We 

□ 


bounds A, L exist. We only consider the finite interval [1, u] 
get \yh{x) - y{x) \ = 0{h), as required. 


If we take h = h{e) —>■ 0 as £ —)■ 0, then the above lemma shows that \f{xj{£),£) — f{xj{e),e)\ —>■ 0 


uniformly over all Xj{e). Comparing with Lemma |3.7| and Lemma 3.9 (specifically bound ( |19[ )), all that 
remains to show for our main result is that the true density / is well approximated by the Stieltjes-smoothed 
density /(•,£), i.e.: f{xj{£),s) — f{xj{e)) 0. Since the Xji^e) depend on £, it is necessary to show that 
/(x,£) — f{x) —^ 0 uniformly in x, where recall that f{x,£) = Imag(m(x -I- iS^e)))/^. 

Lemma 3.11. Approximation of a density by its Stieltjes transform: Let f be a bounded probability 
density function. Denote by m{z) its Stieltjes transform: m{z) = f f{x)/(x — z)dx. Suppose f is uniformly 
continuous. Then as e ^ 0; 

sup —Imag(m(x -I- ie)) — f{x) —>■ 0, 
sGR 'X 


Proof. A simple calculation reveals 


1 If 

f{x,e) := —lmag{m{x + is)) = — 

TT TT J 


sfjt) dt 
{t — x)^ + £^ 


Clearly 


1 

1 = - 


edt 


therefore we can define g such that 

f{x,s) - fix) = ^ J 


TT J {t — x)^ + ' 

- f{x))dt 


( 21 ) 


{t — x)2 -I- £2 


= J git) dt. 


We will bound this by breaking it down into two integrals: one near x and another far from x. Let t] > 0 
be the parameter determining the split, to be chosen later. We bound first the integral of g near x: 


|i—a;|<77 


git) dt 


< 


edt 


'x J\t-x\<-n it-x)^+£^ 


sup |/(t) -/(x)| < w(77,/) 

i:|t—rE|<77 
















Above we used (21) to upper bound the integral term; and we introduced w, a modulus of continuity for 


/. This gives a bound for the integral of g near x. 
Next we bound the integral away from x: 


'\t—x\>r] 


git) dt 


<2|/|c 


edt 


'tt J\t-x\>7j (t-a;)2+e2 


The integral term in the upper bound can be evaluated explicitly as tt — 2 arctan(77/e) 
above by Tre/ry. 

Putting everything together, we find the bound 


and can be bounded 


\fix,e) - /(x)| < ujigJ) + 


2|/U 

V 


Choosing g = e°‘ with a G (0,1) yields a bound that tends to zero uniformly over x, when e —>• 0. The 
modulus of continuity tends to zero because / is uniformly continuous. Thus we have shown the desired 
claim and finished the proof. 

Note that, if / is continuously differentiable in the neighbrhood of a point x, then we obtain an optimal 
bound on |/(x,£) — /(x)| by taking g = This guarantees |/(x,£) — /(x)| = The square 

root scaling in e motivates us to work on the line with imaginary part 5 = throughout the paper, to get 
accuracy of order s. □ 

The previous results can be put together to prove Theorem |3.6| 


of Theorem 3.6 We shall show the uniform convergence of the density approximation /. By lemma 


3.7 


we 


only need to show the uniform convergence on the grid Xiie) within the support intervals. Let [/,m] be such 
an interval, let [Z,m] be the approximation produced by Spectrode for some £. Also Xiie ) is the uniformly 
spaced grid I = X g < Xi < ... < xm = m of length M = . 


Lemma 3.10 is applicable to this grid, because the grid width scales as cx £^/^ —)■ 0. By this lemma 


maxj \ fixjie),e) - fixjis),s)\ 0. 

Further, Lemma |3.9| (specifically bound applies if £ is sufficiently small; and if for fixed £, the 

accuracy parameter g = gis) in the fixed-point algorithm is sufficiently small. By bound (19), and because 
c(e) 0, we get maxj \fixj{s),e) — f{xj{e),e)\ 0. 

On the othe r hand, / is continuous and compactly supported, hence uniformly continuous. Therefore, 
by Lemma 3.11 sup^, |/(x,£) — /{x)| —>■ 0. Putting everything together: max^ |/(xj(£),£) — f{xjie))\ 0. 


This precisely fulfills the hypotheses of Lemma |3.7[ and that lemma gives the desired claim. 


□ 


3.4 Non-atomic measures 


Our algorithm makes sense for general non-atomic limit PSD-s. Indeed, for general PSD H the ODE (12) 
takes the form 

^ = J^{v) := --r7£77r777T> = Pq- 


dx 


1 _^ f dH(t) 


{l + tv) 


This ODE can be implemented and solved efficiently as long as the integral in the denominator is con¬ 
venient to compute. Lemma 3.1 provides the means to find the support, and the fixed-point algorithm 


converges to a starting point even at this level of generality. Therefore the ODE approach is more generally 
applicable than this paper’s restriction to atomic measures. 

In our current implementation of Spectrode, we go beyond mixtures of point masses and also allow 
mixtures of uniform distributions, so H = WiSt- + where Ua,b is a uniform distribution 

on [a,5]. An example was shown in Figure^ To efficiently support uniform distributions, we compute in 
closed form the integrals that appear in the FPA iteration in the function h{v) in algorithm and in the 
ODE. By linearity of the integral, we only need to do the calculation for the individual uniform distributions. 
We use the formulas: 


tdUa,bit) 


loe 

au-l-l 


t^ dUafiit) 


1 


2iogk!^ 


av-\-l 


1 

hv-\-l 


av-\-l 


l + tv V {b—a)v^’ (1 + 


(6 — a)v^ 






















Armed with these, we obtain efficient computation with arbitrary finite mixtures of uniform distributions. 

In addition, a large part of the analysis holds true. Specifically, the convergence of FPA and the analysis 
of the ODE do not use the atomic structure directly. Instead, the atomic structure of the PSD is used 
through the structure of the support of the ESD E as a a union of compact intervals; and the behavior of z' 
characterizing the support (claim 3 of Lemma 3.1). 

To our knowledge, these claims are currently not known to hold for more general PSD. Indeed, in the 
very recent related work [28| examining the fluctuations of the spectrum at the edges of the support, the 
authors work conditionally, assuming that the edges are regular in a certain sense. Extending the validity 
of our algorithm would presumably require developing a better understanding of the support of ESDs for 
general PSDs. This could be an interesting direction for future research. 


4 Applications 

In this section we apply Spectrode to compute moments of the limit ESD and contour integrals of its 
Stieltjes transform. 


4.1 Moments of the ESD 

The uniform convergence of the approximated density allows us to compute nearly arbitrary moments of the 
ESD. These moments have many applications, see 0[li- 

Obtaining the moments of the ESD is in general difficult. The polynomial moments of the ESD 

can be computed using free probability. However, there appear to be no general rules for calculating more 
general moments such as E^log(A) or Pf(A < c). In contrast, they can be computed conveniently with 
our method. 


Corollary 4.1. Let 7 < 1, and /i : M —> M 6e bounded on compaet intervals and Rieniann-integrable on 
eompact intervals. Then the integral of h computed against the density approximation f{‘,s) produced by 
Spectrode converges to the moment of h under the limit ESD F: 


lim 
6 —^0 


h{x)f{x, e) dx 


h(x)f{x) dx. 


The same holds for 7 > I if we account for the point mass at x = 0. 


Proof. Let M be an arbitrary upper bound on the support of F. 


outside [0,M], and so is / by Theorem 


3.6 


Therefore 


Then for sufficiently small e, / is zero 


Kx) (/(a;,e) - /(x)) dx 


rM 


< / \h{x)\ dx sup |/(x, 

Jo a;G[0,M] 


e) - fix)\ 0. 


The convergence to zero follows because the first term is bounded by the assumptions on h, the second term 
tends to zero by Theorem 3.6 The case 7 > 1 is completely analogous. □ 


It is also possible to prove the convergence of Riemann sums h{xi) f{xi,s) A(xj), but this will not 
be pursued here. 

As an example, in Table we show the results of computing three moments of the standard MP dis¬ 
tribution, with 7 = 1/2. The three functions are h{x) = x, log(x), and log^(x). The true value of the 
expectation for x is 1, for log(x) is log(2) — 1 (see e.g. 0), while for log^(x) it is not easily available in 
the standard references on the subject. The numerical values computed with Spectrode, for a precision 
parameter e = 10“®, have a very good accuracy in the case when the true answer is available. In addition, 
Spectrode also computes the integral of log^(x). 







Table 3: Moments of standard MP law with 7 = 1/2. 


Function 

True Value 

Numerical Value 

Accuracy 

X 

1 

1 

2.3308e-06 

log(x) 

-0.30685 

-0.30684 

1.4483e-05 

log^(x) 

unknown 

0.81724 



4.2 Contour integrals of the Stieltjes transform of the ESD 


Spectrode can be adapted to compute contour integrals involving the Stieltjes transform of the limit 
ESD. Such integrals appear in Bai and Silverstein’s central limit theorem for linear spectral statistics of the 


covariance matrix 20 


Let r be a smooth contour in the complex plane that does not intersect the support of the limit ESD F. 
Let c : [0,1] —)■ C be a parametrization of the contour, and v{z) be the Stieltjes transform of F. Note that 
v{z) can be defined by the same formulas 0-® for all z outside the support of F, and in particular for all 
z G r. 

Suppose that for a smooth function G, we want to calculate the following integral over the clockwise 
oriented contour L: 

X = G{z,v{z))dz. (22) 

An example is the mean in the CLT for linear spectral statistics g{\) of E (for a smooth function g), 


which involves the formula (see 20 




f/ 


v{z)^t‘^ dH{i) 

(l+tD(j2))3 


r v(z)^t'^ dHit) 

I-7J y + tt ,(.))2 


dz. 


(23) 


This is a special case of our general problem (22). To compute the general integral I, we perform the 


change of variables z = c(t), and rewrite X in the form X = G{c{t),v{c{t))}c'(t) dt. We assume G(-), 

c{t) and c'(t) are conveniently computable. Then the key problem in approximating this integral is obtaining 
n(c(<)) for the entire range t G [0,1]. This is where the ideas used in Spectrode will help. 

We will obtain h{t) := v{c{t)) by exhibiting an ODE for it. Specifically, by using the chain rule h'{t) = 


v'{c{t))c'(t), and recalling the ODE (12), which states v'{z) = F{v{z)), we get the new ODE for h: h'{t) = 

F{hyit). 

The starting point h{0) (i.e. '(;(c(0))) can again be found using the fixed-point algorithm, provided the 
starting point of the curve, c(0) has nonzero imaginary part. Indeed, this follows from the general properties 
of FPA if the imaginary part of c(0) is positive. Moreover, the Stieltjes transform enjoys v{z) = v(z) (z 
denotes complex conjugation), so FPA also converges for z with negative imaginary part. Now, if the curve 
r lies entirely on the real line, then a starting point be obtained using the function (13), which becomes the 
explicit inverse of the map c —>■ v(c), as shown in 24 . 


The new ODE can be integrated numerically. The obtained values for h can be used to approximate the 
contour integral X using standard quadrature methods. 


4.2.1 Example 

In Table we show an example. Consider the sample covariance matrix S = where Xij are real 

random variables with Ea;^- = 0, Exf^ = 1, and Ex^^ = 3. Let be the eigenvalues of the sample covariance 
matrix, and let F^ be the standard MP law with index 7. Consider a sequence of such problems with 
n,p ^ 00, 7p := p/n 7. Let Fp be discrete spectral distribution of S. For a function g that is analytic in 
a neighborhood of the support of Fy, let Xp{g) be the linear spectral statistic Xp{g) := p {Fp{g) — F^^{g)} = 

J2Li 5(^*) - [ 5 (A)]. 

Then Bai and Silverstein [20| proved that Xp{g) is asymptotically normal with mean given in ( |23[ ), with 
H = Si and P an arbitrary contour enclosing the support of the ESD. It is known (see for instance M) that: 

J{x,Si,'l), J{\ogx,6i,'y) = ilog(l-7). 





















X. 


Figure 5: Computational accuracy of the Monte Carlo method, discussed in Section]^ Here we sample umc = 1000 independent 
random matrices with iid Gaussian entries and aspect ratio 7 = 1/2. We fit a kernel density estimator to the histogram of 
eigenvalues of each sample, and average over independent samples. We display the pointwise error of approximation for p in 
the range ten, 10 ^, 10 ^ 


We use our method, outlined above, to compute the integral ( p^ . We take 7 = 1/2 and compare against 
the closed form solutions for g{x) = x and g{x) = log(a;). We also compute the integral for g{x) = log^(a;), 
for which no closed form solution appears to be known. The results, displayed below in Table show the 
excellent performance of our method. 

Table 4: Mean of normalized LSS for identity covariance. 


Linear Statistic 

True Value 

Numerical Value 

Accuracy 

X 

0 

-7.1292e-12 

7.1292e-12 

log{x) 

-0.34657 

-0.34655 

2.2214e-05 

log^ (x) 

unknown 

1.2111 



In this experiment, we used the circle contour c{t) = a/2 + a/2 • with o = 1.1 • (1 + This 

contour encloses the support of the ESD. The starting point of the ODE, u(c(0)) = v{a), was found using the 
function z(v) (13). More specifically, using the default bisection method in Matlab, we found numerically 
the unique solution to the equation z{v) = o on the interval v G (—1, 0) such that z'(v) > 0. As discussed in 
Sectionthis guarantees the correctness of the starting point. 


5 Related Work 

Problems related to computing the ESD of covariance matrices have been discussed in several important 
works. Here we examine the strengths and weaknesses of related and alternative methods. 


5.1 Monte Carlo 


Monte Carlo (MC) simulation can be used to approximate the eigenvalue density of large covariance matrices 
via a smoothed empirical histogram of eigenvalues. It was proved in 29 that this method consistently 


estimates the ESD. However, we show in a simple simulation that MC is prohibitively slow when more than 
three digits of accuracy is required. 


5.1.1 Experiment setup and parameters 

We use the MP test problem when the covariance matrix Sp = Ip and H = 6i. For a specified pair n,p we 
sample random matrices X with iid standard normal entries A/'(0,1). We compute the empirical eigenvalues of 
the sample covariance matrix S = X^X/n. We fit a kernel density estimate to the histogram of eigenvalues, 
using an Epanechnikov kernel with automatically chosen kernel width in Matlab. Finally, the kernel density 

















estimate is averaged over the umc independent Monte Carlo trials to get a final estimate fucixi) of the 
density. We use the following parameters: umc = 1000, 7 = 1/2, and p takes the values 10,10^, 10^. 

We compare against the the true limit / from Eq. Q and report the error in the density from Eq. §: 
AMc(a::i) = logio Ifucixi) - f(xi)\. 


5.1.2 Results 


In the results in Eigurej^ we see that number of correct significant digits is two or three. We get only 1/2 
extra digit when we move from p = 100 to p = 1000. The experiment for p = 1000 takes about ten minutes 

Computing the eigenvalues using the SVD takes 0(p^) steps. 

1600 hours, which 


on the same hardware described in Section 


2.3 


Eor ten times larger p = lO'^, such an experiment would take about 10® minutes, or cca. 
is prohibitively slow. 

Based on this experiment, we conclude that the MC simulation for computing the ESD, if implemented 
in a straightforward way, is not suitable for getting more than three digits of accuracy. 


5.2 Method of Successive Approximation 

While Marchenko and Pastur do not place emphasis on numerically solving the Marchenko-Pastur equa¬ 
tion, they mention that its solution can be found by the method of successive approximation (SA). SA was 
also proposed by Girko 30 for several variants of the Marchenko-Pastur equation. We will argue that SA is 


not an efficient method for computing the ESD. 

Marchenko and Pastur in consider a more general model than this paper, allowing for an additive 
term Y = A -|- n“^X^TX. They denote by r(^) : [0,1] —>■ K the quantile function of the PSD H, which is 
the limit of the spectrum of T; and c = limp/n, which was called 7 in our paper. Also, they call m.o(z) the 
Stieltjes transform of the limit spectrum of A, writing the equation for the Stieltjes transform of the ESD 
of Y in the form (see equation (1.13) in the English translation of their paper): 


u{z, t) = mo ^z — c J 




(24) 


1 -br(^)u(z,t). 

The unknown function is u{z,t). They show that, with the initial condition u(z, 0) = mniz), there is a 


unique solution u{z,t) analytic in z G C"*" and continuous in t S [0,1] of (24). Then, u(z, 1) is the Stieltjes 
transform mpiz) of the limit ESD F. 

In our special case A = 0, somo(z) = —1/z and the equation simplifies to u(z, t) = —l/(z—c 

Denoting v(z) := u(z, 1), and switching from the integral over t G [0,1] to the integral against dH, we 
see that this reduces to the Silverstein equation (|^. 

The method of successive approximation starts with an arbitrary function uo(z, t) obeying the smoothness 
and continuity conditions above. It defines the sequence of functions Un{z,t) inductively for n G N by: 


Un+i{z,t) = mo 



T{^)Un{z,t)) 


(25) 


For this method it is crucial that one maintains bivariate functions Un{z,t). The definitition of 
depends on the integral of Un against t, even if we are interested only in the values tt„+i(z, 1), for t = 1 . 
Therefore, the method of successive approximation relies on an additional time dimension. However, this 
extra dimension seems costly in the special case when A = 0. There are more efficient methods, such as 
FPA and Spectrode, that do not rely on additional dimensions. 


5.3 Fixed-Point Algorithm (FPA) 

FPA appears to be the standard approach that most researchers use to compute the ESD or sample covariance 
matrices in the Marchenko-Pastur asymptotic regime. Versions of FPA have been developed in many different 
areas, including free probability, wireless communications, signal processing, and mathematical statistics. 
The existing techniques for analyzing it fall into the following categories: (1) complex analytic method; (2) 
contraction by deterministic equivalents of random matrices; and (3) interference functions. 








Balinschi and Bercovici in 11 explain subordination results in free probability, with the Marchenko- 


Pastur-Silverstein equation as a special case. They employ a complex analytic approach, which involves the 
study of Denjoy-Wolff points. As a special case of their results, it follows that FPA converges. This was not 
explicitly stated by the authors, but it is indeed an immediate consequence of their results. 

In many random matrix models, the fixed-point algorithm is often invoked implicitly, to show the unique¬ 
ness of the solution to the fixed-point equation governing the spectrum. For instance, this is the approach 
in the analysis of deterministic equivalents for random matrices in 12 . Here the authors show uniqueness 


of solutions to their equations by exhibiting bounds on the size of successive iterates, which is equivalent to 
the convergence of the fixed point algorithm in that context. 

In the wireless communications literature, explicit fixed-point algorithms have been developed for com¬ 
puting the ESDs of very general random matrix models in several papers: Elillll- The authors in 
give a fixed point algorithm for a random matrix model that is a sum of arbitrary covariance matrices. They 
prove its convergence by showing that the iteration is a contraction for complex points z with large Imag(z), 
similarly to the earlier work 12 . Then the convergence extends to all points outside the support of the ESD 


using Vitali’s theorem. 

In contrast, the authors in [31] take a different approach to proving the convergence of FPA. They show 
that for negative arguments z S (—oo, 0], their equations are fixed points of interference functions, and they 

Again, Vitali’s theorem extends the convergence 
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rely on general convergence results of such functions 
to other points. In the model of the convergence of FPA cannot be established for all complex numbers; 
which is a counterexample showing that FPA is not expected to converge in all circumstances. However, the 
interference function approach for proving convergence of FPA is powerful and general, see for instance 33 

FPA has appeared in other papers as well. 
problem, but without convergence guarantees. Hendrikse and his coauthors 


has a fixed point algorithm for a time series 
discover the fixed-point 


15 


algorithm for the limit ESD of sample covariance matrices, and claim to prove convergence for the argument 
z with sufficiently large imaginary part Imag(z) > c, using elementary arguments. However, their arguments 
appear to be incomplete; as they do not show that the iterates z* remain in the region Imag(z) > c for all t. 

In a free probability context, Belinschi, Mai and Speicher in the paper |35j generalize the fixed-point 
algorithm to compute the limit ESD of arbitrary polynomials P(V„i,..., given the limit ESD of random 

matrices Xni, ..,Xnk- This important paper has the advantage of generality, and in principle can subsume 
other methods. As we showed, however, for our problem FPA can unfortunately be slow for high-precision 
computations. 


5.4 Other Methods 


Special cases of limit ESDs have been computed in a case-by-case fashion, 
of wireless networks, 
polynomial equation. 
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For instance, in the analysis 
develop a computational procedure for a special case that involves a fourth-order 


Further, Nadakuditi and Edelman in the influential work 26 developed a polynomial method as a general 


framework for computing limit spectra of ensembles whose Stieltjes transform is algebraic. As noted by the 
authors (see their Section 7), these methods in general do not lead to an automated way to compute the 
limit density. Indeed, this is presented as an open problem in 26 . Spectrode addresses a narrower setting 


and shows that the ESD can be computed reliably in that setting. 


Olver and Nadakuditi in 37 present an interesting approach for calculating the additive, multiplicative 
and compressive convolution in free probability. The map from population to sample spectra H ^ F 
corresponds to free multiplicative convolution with the identity Marchenko-Pastur distribution. However, 
this method is not generally applicable to our problem. It requires that the support of the LSD F be precisely 
one compact interval, because it relies on specific series expansions (see their Section 4). In our case of this is 
often not the case. We see Spectrode as complementary to their method, for the case of multiple intervals 
in the support. 














6 Software 


A software companion for this paper is available at https://github.com/dobriban/eigenedge. It contains 
implementations of 

1. methods to compute the sample spectrum: Spectrode and the fixed-point method 

2. methods to compute arbitrary moments and quantiles of the ESD 

3. Matlab scripts to reproduce all computational results of this paper 

4. detailed documentation with examples 

The package is user-friendly. Once the appropriate environment is installed, the ESD of a uniform mixture 
of four point masses at t = [1; 2; 3; 4], and with aspect ratio gcunma = 1/2 requires three lines of code: 

t = [1; 2; 3; 4] ; 

gamma = 1/2; 

[grid, density] = spectrodeCt, gamma); “/compute limit ESD 
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